Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora
Much of scientific progress stems from previously published findings, but
searching through the vast sea of scientific publications is difficult. We
often rely on metrics of scholarly authority to find prominent authors, but
these authority indices do not differentiate authority by research
topic. We present Latent Topical-Authority Indexing (LTAI) for jointly
modeling the topics, citations, and topical authority in a corpus of academic
papers. LTAI differs from previous models in two main respects. First,
it explicitly models the generative process of the citations, rather than
treating the citations as given. Second, it models each author's influence on
citations of a paper based on the topics of the cited papers, as well as the
citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS,
and Citeseer. We compare the performance of LTAI against various baselines,
ranging from latent Dirichlet allocation to more advanced models, including
the author-link topic model and the dynamic author citation topic model. The
results show that LTAI achieves higher accuracy than other similar models
when predicting the words, citations, and authors of publications.
Comment: Accepted by Transactions of the Association for Computational
Linguistics (TACL); to appear
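The abstract does not spell out LTAI's generative process, so the snippet below is only a toy illustration of topic-specific authority, not the authors' model. It assumes, hypothetically, that a citation's score is the per-topic product of the citing paper's topic proportions, the cited paper's topic proportions, and the cited author's authority on that topic, passed through a logistic link. The names `topical_affinity`, `citation_probability`, and the `bias` parameter are made up for illustration.

```python
import math

def topical_affinity(theta_citing, theta_cited, authority):
    """Hypothetical score: topic overlap between the citing and cited paper,
    weighted per topic by the cited author's authority on that topic."""
    if not (len(theta_citing) == len(theta_cited) == len(authority)):
        raise ValueError("all vectors need one entry per topic")
    return sum(a * b * w for a, b, w in zip(theta_citing, theta_cited, authority))

def citation_probability(theta_citing, theta_cited, authority, bias=-2.0):
    """Map the affinity into (0, 1) with a logistic link (an assumption, not LTAI's)."""
    score = topical_affinity(theta_citing, theta_cited, authority) + bias
    return 1.0 / (1.0 + math.exp(-score))

# Two topics; the citing and cited papers are both mostly about topic 0.
citing, cited = [0.9, 0.1], [0.8, 0.2]
expert_on_topic0 = [5.0, 0.0]  # hypothetical per-topic authority vectors
expert_on_topic1 = [0.0, 5.0]
p0 = citation_probability(citing, cited, expert_on_topic0)
p1 = citation_probability(citing, cited, expert_on_topic1)
```

Under these assumptions, an author whose authority lies in the topic the papers share gets a higher citation probability than an equally "authoritative" author in a different topic, which is the distinction a single global authority index cannot make.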
Non-Linear Editor for Text-Based Screencast
Screencasts, in which a computer screen is broadcast to a large audience on the
web, are becoming popular as an online educational tool. Among the various types
of screencast content, those involving text editing, including computer
programming, are especially popular. Emerging platforms support such
text-based screencasts by recording every character insertion and deletion made
by the creator and reconstructing the playback on the viewer's screen. However,
these platforms lack rich support for creating and editing the screencast itself,
mainly because recorded text changes are difficult to manipulate: the changes
are tightly coupled in sequence, so modifying an arbitrary part of the sequence
is not trivial. We present a non-linear editing tool for text-based
screencasts. With the proposed selective history rewrite process, our editor
allows users to substitute an arbitrary part of a text-based screencast while
preserving the overall consistency of the rest of the screencast.
Comment: To appear in Adjunct Proceedings of the 30th Annual ACM Symposium on
User Interface Software & Technology (UIST 2017, Poster)
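A text-based screencast, as described above, can be viewed as a sequence of character insert/delete operations replayed in order. The sketch below is a simplified, hypothetical illustration of why substituting part of that sequence is not trivial: after swapping a run of operations, every later operation that lands after the rewritten region must be shifted by the change in length, and later operations that edit inside the region are rejected. `Op`, `replay`, and `substitute` are illustrative names, not the editor's API, and the sketch does not reproduce the paper's selective history rewrite.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Op:
    """One recorded edit: insert `text` at `pos`, or delete `length` chars at `pos`."""
    kind: str        # "ins" or "del"
    pos: int
    text: str = ""
    length: int = 0

def replay(ops: List[Op]) -> str:
    """Rebuild the final buffer by applying every recorded op in order."""
    buf = ""
    for op in ops:
        if op.kind == "ins":
            buf = buf[:op.pos] + op.text + buf[op.pos:]
        else:
            buf = buf[:op.pos] + buf[op.pos + op.length:]
    return buf

def net_length(ops: List[Op]) -> int:
    """Net change in buffer length caused by a run of ops."""
    return sum(len(op.text) if op.kind == "ins" else -op.length for op in ops)

def substitute(ops: List[Op], i: int, j: int, new_ops: List[Op],
               start: int, end: int) -> List[Op]:
    """Swap ops[i:j] for new_ops and repair the positions of later ops.

    Assumes ops[i:j] and new_ops only edit the region [start, end), given in
    buffer coordinates right after ops[:j]. Later ops after the region are
    shifted by the length difference; later ops inside it are rejected.
    """
    delta = net_length(new_ops) - net_length(ops[i:j])
    out = list(ops[:i]) + list(new_ops)
    for op in ops[j:]:
        if op.pos >= end:
            out.append(replace(op, pos=op.pos + delta))  # lies after the region
            continue
        op_end = op.pos + (op.length if op.kind == "del" else 0)
        if op.pos > start or op_end > start:
            raise ValueError(f"{op} edits inside the rewritten region")
        # The op lies before the region, so the region itself moves.
        shift = len(op.text) if op.kind == "ins" else -op.length
        start, end = start + shift, end + shift
        out.append(op)
    return out

# Rewrite the first typed line; the second line is shifted to follow it.
recording = [Op("ins", 0, "print(x)\n"), Op("ins", 9, "print(y)\n")]
edited = substitute(recording, 0, 1, [Op("ins", 0, "print(x + 1)\n")], start=0, end=9)
```

The caller supplies the region bounds here to keep the sketch short; a real editor would have to derive them from the recorded history.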
Time-Aware Representation Learning for Time-Sensitive Question Answering
Time is one of the crucial factors in real-world question answering (QA)
problems. However, language models have difficulty understanding the
relationships between time specifiers, such as 'after' and 'before', and
numbers, because existing QA datasets do not include enough time expressions.
To address this issue, we propose a Time-Context aware Question Answering
(TCQA) framework. We introduce a Time-Context dependent Span Extraction (TCSE)
task and build a time-context dependent data generation framework for model
training. Moreover, we present a metric to evaluate the time awareness of a
QA model using TCSE. The TCSE task consists of a question and four candidate
sentences, each classified as correct or incorrect based on time and context.
The model is trained to extract the answer span from the sentence that is
correct in both time and context. The model trained with TCQA outperforms
baseline models by up to 8.5 F1 points on the TimeQA dataset. Our dataset and
code are available at https://github.com/sonjbin/TCQA.
Comment: 2023 EMNLP Findings
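The TCSE setup described in this abstract can be sketched as a small data structure, a rule that selects the sentence correct in both time and context, and a token-overlap F1 of the kind commonly used for span extraction. This is a hypothetical illustration: the class and field names, the example question and sentences, and the simplified F1 (lowercasing and whitespace tokenization only, without the article and punctuation stripping of the SQuAD evaluation script) are assumptions, not the released TCQA code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    sentence: str
    time_ok: bool     # consistent with the question's time specifier
    context_ok: bool  # about the entity/event the question asks about

@dataclass
class TCSEInstance:
    question: str
    candidates: List[Candidate]  # four candidates in the TCSE setup
    answer: str                  # gold span inside the fully correct sentence

def target_sentence(inst: TCSEInstance) -> str:
    """Return the one candidate that is correct in both time and context."""
    matches = [c for c in inst.candidates if c.time_ok and c.context_ok]
    if len(matches) != 1:
        raise ValueError("expected exactly one fully correct candidate")
    return matches[0].sentence

def token_f1(prediction: str, gold: str) -> float:
    """Simplified token-overlap F1 (lowercase + whitespace split only)."""
    pred_toks = prediction.lower().split()
    gold_toks = gold.lower().split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

# Hypothetical instance: one candidate per combination of the two flags.
example = TCSEInstance(
    question="Which team did Alex play for after 2015?",
    candidates=[
        Candidate("Alex joined Team B in 2016.", True, True),
        Candidate("Alex played for Team A in 2012.", False, True),
        Candidate("Sam joined Team C in 2017.", True, False),
        Candidate("Sam played for Team D in 2010.", False, False),
    ],
    answer="Team B",
)
```

Building the example with one candidate per flag combination is only one plausible design; the paper's actual candidate construction may differ.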
- …